Optimized deletion for Trie/TrieMap #525
Extracting a helper function that does not access the closure.
Co-authored-by: Claudio Russo <[email protected]>
src/Trie.mo
public func remove<K, V>(trie : Trie<K, V>, key : Key<K>, equal : (K, K) -> Bool) : (Trie<K, V>, ?V) {
  func rec(trie : Trie<K, V>, bitPosition : Nat) : (Trie<K, V>, ?V) {
    switch trie {
      case (#empty) { (#empty, null) };
This looks plausible to me, but in #427 (comment) Joachim warns against a broken path compression optimization in filter or map. Does that warning apply here?
Indeed. The path cannot be compressed for branch nodes because the bit positions matter (as Joachim correctly mentioned). This is why I only collapse the leaf nodes (with an empty sibling or another sibling leaf). Or am I missing something?
Not sure. I wonder if you can delete a branch if both trees are empty. Can you wind up with nested branches that are actually empty at their leaves or does the recursion eliminate that coming back up the tree?
I added collapsing of two empty leaves (since the leaf() function also returns an empty node for an empty leaf).
After deleting all elements of a trie with 1_000_000 entries, the trie is empty again (it no longer contains any branches).
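To make the collapsing rule discussed in this thread concrete, here is a minimal sketch on a simplified model. It is not the code from this PR: `MiniTrie`, `collapse` and the one-entry-per-leaf layout are hypothetical simplifications (the real Trie stores association lists in its leaves and can also merge two sibling leaves, which is omitted here). Keys are represented only by their Nat32 hash.

```motoko
// Simplified model, NOT the base library's Trie: each leaf holds exactly one
// entry, and a key is identified by its Nat32 hash alone (assumption).
type MiniTrie<V> = {
  #empty;
  #leaf : (Nat32, V);
  #branch : (MiniTrie<V>, MiniTrie<V>);
};

// Rebuild a node after a removal below it. Only leaves and empty nodes are
// collapsed; a branch that still has a branch child is kept, because the bit
// positions along the path matter for lookups.
func collapse<V>(left : MiniTrie<V>, right : MiniTrie<V>) : MiniTrie<V> {
  switch (left, right) {
    case (#empty, #empty) { #empty };
    case (#leaf l, #empty) { #leaf l };
    case (#empty, #leaf r) { #leaf r };
    case _ { #branch(left, right) };
  }
};

// Remove the entry with the given hash, collapsing along the delete path
// while returning from the recursion.
func remove<V>(trie : MiniTrie<V>, hash : Nat32) : MiniTrie<V> {
  func rec(t : MiniTrie<V>, bitPosition : Nat32) : MiniTrie<V> {
    switch t {
      case (#empty) { #empty };
      case (#leaf(h, _)) { if (h == hash) { #empty } else { t } };
      case (#branch(l, r)) {
        let bit = (hash >> bitPosition) & 1;
        if (bit == 0) {
          collapse(rec(l, bitPosition + 1), r)
        } else {
          collapse(l, rec(r, bitPosition + 1))
        }
      };
    }
  };
  rec(trie, 0)
};

// Usage: hashes 0 and 1 differ in bit 0, so they sit left and right.
let t : MiniTrie<Text> = #branch(#leaf(0, "a"), #leaf(1, "b"));
let afterOne = remove(t, 0); // the remaining leaf replaces the branch
let afterAll = remove(afterOne, 1);
assert (afterAll == #empty);
```

In this model, removing every entry leads back to `#empty` because each step up the recursion either keeps a non-trivial branch or replaces it with a leaf or an empty node.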
Looks good to me, but would be nice if @matthewhammer could take a look.
@luc-blaeser I do not follow this claim. The existing logic, as with the new logic proposed here, uses association lists. Those lists then must do the Looking at the existing
So we see that in the subcase for I realize that this PR is doing some orthogonal improvements around this operation, but I remain very confused by the claim that the existing logic does not actually release memory (after GC over a
As replace with null is deleting in AssocList
Thanks Matthew for pointing this out. You are right, I found a mistake in my benchmark case (deleting less than allocating). I am very sorry for this confusion. So, after re-measuring (hopefully correct this time), the benchmark shows that a majority of the memory can be reclaimed after deleting the entries in the Measurement results, creating 1_000_000 Heap size
The question is whether we should consider this PR or close it for the moment.
@luc-blaeser Okay, I am relieved. No worries! Before you mentioned finding an actual measurement-related bug, I started wondering if the Trie is just so wasteful (rebuilding those little lists in a pure way) that it seemed like there was a leak, as you alluded to today in the meeting. FWIW, I apologize for the
In addition to the space trimming for Both of those improvements seem valuable to me! BTW, I didn't consider these cases for compressing subtrees when I was "fixing" the earlier, broken way that Now,

    if (isEmpty(fl) and isEmpty(fr)) {
      #empty
    } else {
      branch(fl, fr)
    }

But it's missing some compression opportunities when one subtree is a leaf and the other is null. Your logic in the PR could expand to improve
The space savings (and more tests) still look worthwhile to me, so I'd suggest continuing with the PR or merging it in its current state (perhaps with a TODO to improve filter and mapFilter, if we have that). Is it possible to get to the situation where removing all the elements from a tree gets you back to #empty, or is that just too expensive to arrange? I'm just curious, not necessarily suggesting we reach for that goal.
I tested it, and the tree is actually empty again after the removal of all entries. I will also look at the Update: The empty node collapsing logic has been applied to
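For readers who want to reproduce the "back to #empty" check, a small script along the following lines could be used. It is a sketch, not the test from this PR: the `key` helper, the entry count of 1000 and the `backToEmpty` name are illustrative choices, and whether the last check reports `true` depends on the library version (with the collapsing logic from this PR it is expected to).

```motoko
import Trie "mo:base/Trie";
import Text "mo:base/Text";
import Nat "mo:base/Nat";
import Iter "mo:base/Iter";
import Debug "mo:base/Debug";

// Hypothetical helper: wrap a Text into the Key record that Trie expects.
func key(t : Text) : Trie.Key<Text> { { hash = Text.hash(t); key = t } };

var trie = Trie.empty<Text, Nat>();

// Insert 1000 entries ...
for (i in Iter.range(0, 999)) {
  trie := Trie.put(trie, key(Nat.toText(i)), Text.equal, i).0;
};

// ... and remove every one of them again.
for (i in Iter.range(0, 999)) {
  trie := Trie.remove(trie, key(Nat.toText(i)), Text.equal).0;
};

// No entries are left in any version of the library.
assert (Trie.size(trie) == 0);

// Structural check: is the root the #empty variant again, with no
// leftover branches?
let backToEmpty = switch trie {
  case (#empty) { true };
  case _ { false };
};
Debug.print("root is #empty: " # debug_show (backToEmpty));
```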
/// trie := Trie.put(trie, key "test", Text.equal, 1).0;
/// trie := Trie.replace(trie, key "test", Text.equal, 42).0;
/// assert (Trie.get(trie, key "hello", Text.equal) == ?42);
/// ```
public func replace<K, V>(t : Trie<K, V>, k : Key<K>, k_eq : (K, K) -> Bool, v : ?V) : (Trie<K, V>, ?V) {
I wonder if this would produce a little less garbage by communicating the previous value using private state and having rec simply return a trie, not a pair, as in RedBlackTree.insert/replace (or whatever it is called). Maybe not worth it.
Thanks for the suggestion. I implemented this.
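The idea behind that suggestion can be sketched as follows. This is an illustration only, not the code that was pushed to the PR: to stay short it uses a hypothetical association list `Assoc` instead of the trie, but the pattern is the same. The inner `rec` returns only the updated structure, and the previous value is handed back through a local `var`, so no result pair is allocated at each recursion level.

```motoko
// Hypothetical stand-in for the trie: a plain association list keyed by Nat.
type Assoc<V> = ?((Nat, V), Assoc<V>);

// Replace (or, with `value = null`, delete) the entry for `key`.
// Returns the updated list and the previous value, if any.
func replace<V>(list : Assoc<V>, key : Nat, value : ?V) : (Assoc<V>, ?V) {
  // Private state: set by `rec` when it finds the key.
  var previous : ?V = null;

  // Returns only the new list, not a (list, previous) pair.
  func rec(l : Assoc<V>) : Assoc<V> {
    switch l {
      case null {
        switch value {
          case null { null };
          case (?v) { ?((key, v), null) };
        }
      };
      case (?((k, old), tail)) {
        if (k == key) {
          previous := ?old;
          switch value {
            case null { tail }; // replacing with null deletes the entry
            case (?v) { ?((key, v), tail) };
          }
        } else {
          ?((k, old), rec(tail))
        }
      };
    }
  };

  let updated = rec(list);
  (updated, previous)
};

// Usage
let l0 : Assoc<Text> = ?((1, "a"), null);
let (l1, prev1) = replace<Text>(l0, 1, ?"b");
assert (prev1 == ?"a");
let (l2, prev2) = replace<Text>(l1, 1, null);
assert (prev2 == ?"b");
assert (l2 == null);
```

Only one pair is built per call, at the very end, instead of one per level of recursion.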
Nice. Just made some minor suggestions to remove parens and perhaps optimize replace, but all optional.
There are probably similar spurious parens in patterns in the rest of the file, but I can't make GH suggestions on unchanged lines. Probably not worth fixing.
Thanks a lot for all these valuable suggestions. Would you like to push the parentheses optimization?
LGTM apart from the redundant assignment.
Co-authored-by: Claudio Russo <[email protected]>
Update: The current AssocList.replace() with null is indeed deleting.
This PR only optimizes the Trie and TrieMap delete functionality so that it can free more memory.
Leaves (but not the branches) are recursively collapsed along the delete path in the trie.
TrieMap, implicitly also testing Trie.